Missing value estimation methods for DNA microarrays
Abstract
Motivation: Gene expression microarray experiments can generate data sets with multiple missing expression values. Unfortunately, many algorithms for gene expression analysis require a complete matrix of gene array values as input. For example, methods such as hierarchical clustering and K-means clustering are not robust to missing data, and may lose effectiveness even with a few missing values. Methods for imputing missing data are needed, therefore, to minimize the effect of incomplete data sets on analyses, and to increase the range of data sets to which these algorithms can be applied. In this report, we investigate automated methods for estimating missing data.

Results: We present a comparative study of several methods for the estimation of missing values in gene microarray data. We implemented and evaluated three methods: a Singular Value Decomposition (SVD) based method (SVDimpute), weighted K-nearest neighbors (KNNimpute), and row average. We evaluated the methods using a variety of parameter settings and over different real data sets, and assessed the robustness of the imputation methods to the amount of missing data over the range of 1-20% missing values. We show that KNNimpute appears to provide a more robust and sensitive method for missing value estimation than SVDimpute, and both SVDimpute and KNNimpute surpass the commonly used row average method (as well as filling missing values with zeros). We report results of the comparative experiments and provide recommendations and tools for accurate estimation of missing microarray data under a variety of conditions.
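To make the comparison concrete, the sketch below contrasts a row-average fill with a weighted K-nearest-neighbors fill on a tiny expression matrix. It is an illustration only, not the authors' KNNimpute implementation: the abstract does not specify the distance measure, the weighting scheme, or the value of K, so the root-mean-square distance over shared columns, the inverse-distance weights, and the default `k=2` are all assumptions. The function names `row_average_impute` and `knn_impute` are hypothetical, rows are assumed to be genes and columns experiments, and missing values are represented as `None`.

```python
import math


def row_average_impute(matrix):
    """Fill each missing entry (None) with the mean of the observed values in its row."""
    filled = []
    for row in matrix:
        observed = [v for v in row if v is not None]
        # Assumption: a row with no observed values is filled with 0.0.
        mean = sum(observed) / len(observed) if observed else 0.0
        filled.append([mean if v is None else v for v in row])
    return filled


def knn_impute(matrix, k=2):
    """Fill each missing entry with an inverse-distance-weighted average of the
    same column in the k most similar rows that have that column observed."""
    filled = [list(row) for row in matrix]
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for m, other in enumerate(matrix):
                if m == i or other[j] is None:
                    continue
                # Compare the two rows only on columns observed in both.
                shared = [(a, b) for a, b in zip(row, other)
                          if a is not None and b is not None]
                if not shared:
                    continue
                # Assumed distance: root-mean-square difference over shared columns.
                dist = math.sqrt(sum((a - b) ** 2 for a, b in shared) / len(shared))
                candidates.append((dist, other[j]))
            candidates.sort(key=lambda c: c[0])
            nearest = candidates[:k]
            if not nearest:
                continue  # no usable neighbor: leave the entry missing
            # Assumed weighting: inverse distance, with a small constant to avoid
            # division by zero when a neighbor matches exactly.
            weights = [1.0 / (d + 1e-9) for d, _ in nearest]
            filled[i][j] = sum(w * v for w, (_, v) in zip(weights, nearest)) / sum(weights)
    return filled


# Toy data: the first row matches the second on its observed columns.
expression = [
    [1.0, 2.0, None],
    [1.0, 2.0, 3.0],
    [10.0, 20.0, 30.0],
]
row_avg_result = row_average_impute(expression)
knn_result = knn_impute(expression, k=2)
```

On this toy matrix the row-average fill uses only the first row's own values, while the neighbor-based fill borrows the missing column from the row with the most similar profile. This is the kind of difference the comparison in the abstract is concerned with.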
Similar resources
Robust SVD Method for Missing Value Estimation of DNA Microarrays
A majority of DNA microarray datasets contain missing or corrupt values and it is critical to estimate these values accurately. These missing values are most often attributed to insufficient experimental resolution or the presence of foreign objects on the experimental slide’s surface. To improve existing missing value estimation algorithms, this paper introduces and investigates the scalable s...
Collateral Missing Value Estimation: Robust Missing Value Estimation for Consequent Microarray Data Processing
Microarrays have unique ability to probe thousands of genes at a time that makes it a useful tool for variety of applications, ranging from diagnosis to drug discovery. However, data generated by microarrays often contains multiple missing gene expressions that affect the subsequent analysis, as most of the times these missing values are ignored. In this paper we have analyzed how accurate esti...
A Simultaneous Reconstruction of Missing Data in DNA Microarrays
We suggest here a new method of the estimation of missing entries in a gene expression matrix, which is done simultaneously— i.e., the estimation of one missing entry influences the estimation of other entries. Our method is closely related to the methods and techniques used for solving inverse eigenvalue problems. 2000 Mathematical Subject Classification: 15A18, 92D10
Heuristic Non Parametric Collateral Missing Value Imputation: A Step Towards Robust Post-genomic Knowledge Discovery
Microarrays are able to measure the patterns of expression of thousands of genes in a genome to give profiles that facilitate much faster analysis of biological processes for diagnosis, prognosis and tailored drug discovery. Microarrays, however, commonly have missing values which can result in erroneous downstream analysis. To impute these missing values, various algorithms have been proposed ...
Journal: Bioinformatics
Volume 17, Issue 6
Pages: -
Publication date: 2001